Enumeration Complexity of Conjunctive Queries with Functional Dependencies
We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing.
In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movie, or friends per person).
On the Enumeration of all Minimal Triangulations
We present an algorithm that enumerates all the minimal triangulations of a
graph in incremental polynomial time. Consequently, we get an algorithm for
enumerating all the proper tree decompositions, in incremental polynomial time,
where "proper" means that the tree decomposition cannot be improved by removing
or splitting a bag.
Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries
We study the enumeration of answers to Unions of Conjunctive Queries (UCQs)
with optimal time guarantees. More precisely, we wish to identify the queries
that can be solved with linear preprocessing time and constant delay. Despite
the basic nature of this problem, it was shown only recently that UCQs can be
solved within these time bounds if they admit free-connex union extensions,
even if all individual CQs in the union are intractable with respect to the
same complexity measure. Our goal is to understand whether there exist
additional tractable UCQs, not covered by the currently known algorithms. As a
first step, we show that some previously unclassified UCQs are hard using the
classic 3SUM hypothesis, via a known reduction from 3SUM to triangle listing in
graphs. As a second step, we identify a question about a variant of this graph
task which is unavoidable if we want to classify all self-join-free UCQs: is it
possible to decide the existence of a triangle in a vertex-unbalanced
tripartite graph in linear time? We prove that this task is equivalent in
hardness to some family of UCQs. Finally, we show a dichotomy for unions of two
self-join-free CQs if we assume the answer to this question is negative. Our
conclusion is that, to reason about a class of enumeration problems defined by
UCQs, it is enough to study the single decision problem of detecting triangles
in unbalanced graphs. Without a breakthrough for triangle detection, we have no
hope to find an efficient algorithm for additional unions of two self-join-free
CQs. On the other hand, if we one day have such a triangle detection
algorithm, we will immediately obtain an efficient algorithm for a family of
UCQs that are currently not known to be tractable.
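To make the decision problem in the abstract above concrete, the following is a minimal sketch of a naive triangle detector for a tripartite graph. It is a baseline for illustration only, not the linear-time algorithm whose existence the abstract leaves open, and the function name and edge-list representation are assumptions made here.

```python
from collections import defaultdict


def has_triangle(ab_edges, bc_edges, ac_edges):
    """Naive check for a triangle (a, b, c) in a tripartite graph.

    The three edge lists connect parts A-B, B-C and A-C respectively.
    This is an illustrative baseline; its running time is not linear
    in general.
    """
    # Index the C-neighbours of every vertex in B and in A.
    b_to_c = defaultdict(set)
    for b, c in bc_edges:
        b_to_c[b].add(c)
    a_to_c = defaultdict(set)
    for a, c in ac_edges:
        a_to_c[a].add(c)

    empty = set()
    # For each A-B edge, look for a common neighbour in C.
    for a, b in ab_edges:
        if not a_to_c.get(a, empty).isdisjoint(b_to_c.get(b, empty)):
            return True
    return False
```

The graph is "vertex-unbalanced" when the three parts differ greatly in size; this sketch does not exploit that imbalance in any way.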
Tuple-Independent Representations of Infinite Probabilistic Databases
Probabilistic databases (PDBs) are probability spaces over database
instances. They provide a framework for handling uncertainty in databases, as
occurs due to data integration, noisy data, data from unreliable sources or
randomized processes. Most of the existing theory literature investigated
finite, tuple-independent PDBs (TI-PDBs) where the occurrences of tuples are
independent events. Only recently, Grohe and Lindner (PODS '19) introduced
independence assumptions for PDBs beyond the finite domain assumption. In the
finite setting, a major argument for discussing the theoretical properties of TI-PDBs
is that they can be used to represent any finite PDB via views. This is no
longer the case once the number of tuples is countably infinite. In this paper,
we systematically study the representability of infinite PDBs in terms of
TI-PDBs and the related block-independent disjoint PDBs.
The central question is which infinite PDBs are representable as first-order
views over tuple-independent PDBs. We give a necessary condition for the
representability of PDBs and provide a sufficient criterion for
representability in terms of the probability distribution of a PDB. With
various examples, we explore the limits of our criteria. We show that
conditioning on first-order properties yields no additional power in terms of
expressivity. Finally, we discuss the relation between purely logical and
arithmetic reasons for (non-)representability.
Database Repairing with Soft Functional Dependencies
A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a "cardinality repair" of an inconsistent database; in soft interpretations, this subset corresponds to a "most probable world" of a probabilistic database, a "most likely intention" of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of finding an optimal subset for the more general soft semantics. This paper makes significant progress in this direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases: a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal "almost matching" of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy.
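The cost model described in the abstract above can be sketched as follows. This is an exhaustive search on a toy table, intended only to illustrate the objective; it is not one of the paper's algorithms. It assumes that a "violation" of a functional dependency is a pair of kept tuples that agree on the left-hand side and disagree on the right-hand side, and all names (`violations`, `total_cost`, `optimal_subset`) are hypothetical.

```python
from itertools import combinations


def violations(rows, lhs, rhs):
    """Count pairs of rows that agree on lhs but disagree on rhs
    (the assumed notion of a violation of the FD lhs -> rhs)."""
    return sum(
        1
        for s, t in combinations(rows, 2)
        if all(s[a] == t[a] for a in lhs) and any(s[a] != t[a] for a in rhs)
    )


def total_cost(table, kept, fds, deletion_cost):
    """Cost of keeping the rows indexed by `kept`: the deletion costs of
    the excluded rows plus weight * violations for every soft FD."""
    kept_rows = [table[i] for i in sorted(kept)]
    excluded = sum(deletion_cost[i] for i in range(len(table)) if i not in kept)
    penalty = sum(w * violations(kept_rows, lhs, rhs) for lhs, rhs, w in fds)
    return excluded + penalty


def optimal_subset(table, fds, deletion_cost):
    """Exhaustive search over all subsets; exponential, for tiny inputs only."""
    best_cost, best_kept = None, None
    for r in range(len(table) + 1):
        for kept in combinations(range(len(table)), r):
            cost = total_cost(table, set(kept), fds, deletion_cost)
            if best_cost is None or cost < best_cost:
                best_cost, best_kept = cost, set(kept)
    return best_cost, best_kept


# Toy data: two rows conflict on the soft FD person -> city.
table = [
    {"person": "ann", "city": "Paris"},
    {"person": "ann", "city": "Rome"},
    {"person": "bob", "city": "Oslo"},
]
cheap_fd = [(("person",), ("city",), 0.5)]   # violating is cheaper than deleting
costly_fd = [(("person",), ("city",), 3.0)]  # deleting one row is cheaper
unit_costs = [1, 1, 1]
```

With `cheap_fd` the optimal subset keeps every row and pays for the single violation; with `costly_fd` it drops one of the two conflicting rows. Letting the weight grow without bound recovers the strict case that the abstract calls a cardinality repair.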
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
We study the question of when we can provide logarithmic-time direct access
to the k-th answer to a Conjunctive Query (CQ) with a specified ordering over
the answers, following a preprocessing step that constructs a data structure in
time quasilinear in the size of the database. Specifically, we embark on the
challenge of identifying the tractable answer orderings that allow for ranked
direct access with such complexity guarantees. We begin with lexicographic
orderings and give a decidable characterization (under conventional complexity
assumptions) of the class of tractable lexicographic orderings for every CQ
without self-joins. We then turn to the more general orderings by the sum
of attribute weights and show that, for them, ranked direct access is tractable
only in trivial cases. Hence, to better understand the computational challenge
at hand, we consider the more modest task of providing access to only a single
answer (i.e., finding the answer at a given position) - a task that we refer to
as the selection problem. We indeed achieve a quasilinear-time algorithm for a
subset of the class of full CQs without self-joins, by adopting a solution of
Frederickson and Johnson to the classic problem of selection over sorted
matrices. We further prove that none of the other queries in this class admit
such an algorithm.
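As an illustration of the selection problem under an ordering by sum of weights, here is a naive heap-based baseline. It assumes a hypothetical query whose answers are all pairs (x, y) drawn from two weight lists, ranked by x + y. It is not the Frederickson and Johnson algorithm mentioned in the abstract and does not achieve its time bound; the function name and the 0-based position `k` are choices made here.

```python
import heapq


def kth_smallest_sum(xs, ys, k):
    """Return the k-th smallest value (0-based) among all sums x + y.

    Naive baseline: sorts both inputs and pops k entries from a heap,
    so its cost grows with k.
    """
    xs, ys = sorted(xs), sorted(ys)
    if not 0 <= k < len(xs) * len(ys):
        raise IndexError("k is out of range")

    # One heap entry per row of the implicit sorted matrix xs[i] + ys[j],
    # each starting at column 0.
    heap = [(x + ys[0], i, 0) for i, x in enumerate(xs)]
    heapq.heapify(heap)

    # Discard the k smallest sums, advancing within the popped row each time.
    for _ in range(k):
        _, i, j = heapq.heappop(heap)
        if j + 1 < len(ys):
            heapq.heappush(heap, (xs[i] + ys[j + 1], i, j + 1))

    return heap[0][0]
```

The implicit matrix of sums is sorted along every row and every column, which is the structure that selection over sorted matrices relies on.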